Objective

The objectives of this notebook are:

  • To document research and implementation of Principal Component Analysis for image compression

  • To demonstrate image compression using PCA

Research

Principal Component Analysis is a way to identify and express patterns in data so as to highlight similarities and differences within it.

This is especially useful for high-dimensional data, which cannot be visualized directly to identify patterns.

After the pattern is identified, it can be used to compress the data to a lower dimension without much loss of information. This property of PCA is used for image compression.

PCA is based on the eigendecomposition of the covariance matrix of the data.

Covariance

Covariance is a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction.

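Let \(X\) and \(Y\) be two variables, each observed \(n\) times as \(x_1, \dots, x_n\) and \(y_1, \dots, y_n\). The sample covariance is computed as:

\[ Cov(X, Y) = \frac{\sum_{k=1}^{n}{(x_k - \bar{x})(y_k - \bar{y})}}{n-1} \]

where \(n\) is the number of observations, and \(\bar{x}\) and \(\bar{y}\) are the sample means of \(X\) and \(Y\)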
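As a quick check of the definition, the sketch below computes the sample covariance by hand (sum of products of deviations from the means, divided by \(n - 1\)) and compares it with base R's cov(). The numbers are made up purely for illustration.

```r
# Two small illustrative samples (made-up values, not from the image)
x <- c(2.1, 2.5, 3.6, 4.0, 4.8)
y <- c(8.0, 10.1, 12.4, 14.2, 15.9)
n <- length(x)

# Sample covariance: products of deviations from the means, summed,
# then divided by n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Base R's cov() uses the same n - 1 denominator by default
cov_builtin <- cov(x, y)

print(c(manual = cov_manual, builtin = cov_builtin))
```

Both values agree, and the positive sign reflects that x and y increase together.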

Covariance Matrix

Covariance values for a set of \(N\) variables, \(X_1, \dots, X_N\), are arranged in an \(N \times N\) covariance matrix, \(C\), where

\[ C = \begin{bmatrix} c_{1, 1} & c_{1, 2} & c_{1, 3} & \cdots & c_{1, N}\\ c_{2, 1} & c_{2, 2} & c_{2, 3} & \cdots & c_{2, N}\\ c_{3, 1} & c_{3, 2} & c_{3, 3} & \cdots & c_{3, N}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ c_{N, 1} & c_{N, 2} & c_{N, 3} & \cdots & c_{N, N}\\ \end{bmatrix} \]

where \(c_{i, j}\) is given as

\[ c_{i, j} = Cov(X_{i}, X_{j}) \]

Note: The diagonal elements, \(c_{i, i}\), give \(Var(X_{i})\)

Note: The covariance matrix is symmetric, as \(c_{i, j} = c_{j, i}\)
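The two notes above can be verified numerically. This is a minimal sketch on randomly generated data (an assumption for illustration only); cov() applied to a matrix treats columns as variables and rows as samples.

```r
set.seed(42)

# Hypothetical data: 6 samples (rows) of 3 variables (columns)
X <- matrix(rnorm(18), nrow = 6, ncol = 3)

# Covariance matrix of the columns: a 3 x 3 matrix
C <- cov(X)
print(C)

# Diagonal elements equal the variance of each variable
print(all.equal(diag(C), apply(X, 2, var)))

# The matrix equals its own transpose
print(isSymmetric(C))
```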

Eigendecomposition

In linear algebra, eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors.

A vector \(\vec{v}\) of dimension \(N\) is an eigenvector of a square \(N \times N\) matrix \(A\) if it satisfies the linear equation

\[ A\vec{v} = \lambda \vec{v} \]

where \(\lambda\) is a scalar, called the eigenvalue corresponding to the eigenvector \(\vec{v}\)

Note: \(\vec{v}\) needs to be a non-zero vector

Geometrically, this equation means that an eigenvector \(\vec{v}\) of \(A\) is a vector that, when linearly transformed by \(A\), stays on the same line through the origin and is only scaled by a factor of \(\lambda\) (its direction is reversed if \(\lambda\) is negative).
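To make the eigenvector equation concrete, the sketch below uses base R's eigen() on a small symmetric matrix chosen for illustration, checks \(A\vec{v} = \lambda \vec{v}\) for each pair, and rebuilds \(A\) from its eigenvalues and eigenvectors.

```r
# A small symmetric matrix chosen for illustration
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(A)
lambda <- e$values   # eigenvalues, returned in decreasing order
V <- e$vectors       # eigenvectors, one per column

# Check A v = lambda v for every eigenpair
for (i in seq_along(lambda)) {
  lhs <- as.vector(A %*% V[, i])
  rhs <- lambda[i] * V[, i]
  print(isTRUE(all.equal(lhs, rhs)))
}

# For a symmetric matrix the eigenvectors are orthonormal, so
# A = V diag(lambda) V^T
A_rebuilt <- V %*% diag(lambda) %*% t(V)
print(all.equal(A_rebuilt, A))
```

For this matrix the eigenvalues are 3 and 1.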

PCA Approach

PCA is performed using the following steps:

  • Step 1: Represent the data as an \(M \times N\) matrix, where \(M\) is the number of samples and \(N\) is the number of features (dimensions)

  • Step 2: Center the data by subtracting the mean of each feature across samples from the value of that feature for each sample

  • Step 3: Calculate covariance matrix for the centered data

  • Step 4: Perform eigendecomposition on the covariance matrix

  • Step 5: Arrange the eigenvectors and eigenvalues in decreasing order of eigenvalues

  • Step 6: Select top \(l\) eigenvectors based on eigenvalues where \(l < N\)
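The six steps above can be sketched in base R as follows. The data is randomly generated and the choice \(l = 2\) is arbitrary; both are assumptions made for illustration. The result is compared against stats::prcomp(), whose squared standard deviations should match the eigenvalues and whose rotation should match the eigenvectors up to sign.

```r
set.seed(1)

# Step 1: hypothetical data with M = 50 samples and N = 4 correlated features
M <- 50
N <- 4
X <- matrix(rnorm(M * N), nrow = M) %*% matrix(runif(N * N), nrow = N)

# Step 2: center each feature by subtracting its mean across samples
X_centered <- sweep(X, 2, colMeans(X))

# Step 3: covariance matrix of the centered data (N x N)
C <- cov(X_centered)

# Step 4: eigendecomposition of the covariance matrix
e <- eigen(C, symmetric = TRUE)

# Step 5: order eigenpairs by decreasing eigenvalue
# (eigen() already does this for symmetric input; shown explicitly here)
ord <- order(e$values, decreasing = TRUE)
values <- e$values[ord]
vectors <- e$vectors[, ord]

# Step 6: keep the top l eigenvectors, with l < N
l <- 2
W <- vectors[, 1:l, drop = FALSE]

# Cross-check against prcomp()
pca <- prcomp(X, center = TRUE, scale. = FALSE)
print(all.equal(values, pca$sdev^2))
print(all.equal(abs(W), abs(pca$rotation[, 1:l]), check.attributes = FALSE))
```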

Image Compression

After selecting top \(l\) eigenvectors based on eigenvalues of the covariance matrix of data, they can be used to compress images as follows:

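  • Step 1: Multiply the centered data (\(M \times N\)) by the matrix of selected eigenvectors (\(N \times l\)) to get a lower-dimensional (\(M \times l\)) representation of the data

  • Step 2: Reconstruct an approximation of the original data by multiplying the lower-dimensional data by the transpose of the matrix of selected eigenvectors, then adding the feature means back. Because the eigenvectors are orthonormal, the transpose plays the role of the inverse

The reconstructed data is a lossy, compressed approximation of the original data.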
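The projection and reconstruction steps can be sketched as below, on randomly generated data (an assumption for illustration). The helper name reconstruct is hypothetical. The reconstruction error should shrink as more eigenvectors are kept and vanish when all of them are used.

```r
set.seed(7)

# Hypothetical data: 40 samples, 5 features
X <- matrix(rnorm(200), nrow = 40, ncol = 5)
mu <- colMeans(X)
X_centered <- sweep(X, 2, mu)

e <- eigen(cov(X_centered), symmetric = TRUE)

# Hypothetical helper: compress to l dimensions, then reconstruct
reconstruct <- function(l) {
  W <- e$vectors[, seq_len(l), drop = FALSE]  # N x l selected eigenvectors
  Z <- X_centered %*% W                       # M x l lower-dimensional data
  sweep(Z %*% t(W), 2, mu, "+")               # back to M x N, means restored
}

# Mean squared reconstruction error for l = 1, ..., 5
err <- sapply(1:5, function(l) mean((X - reconstruct(l))^2))
print(signif(err, 3))
```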

Analysis

In this section, we will demonstrate PCA-based image compression on a real image.

Load Image

We use the jpeg::readJPEG() function to load the image from a local directory.

file_path <- "./sample_images/nishit_jain.jpeg"
loaded_image <- jpeg::readJPEG(file_path, native = FALSE)
plotImage(list(loaded_image))

Image Properties

We are looking at the size and the dimensions of the original image.

  • Size of original image is 50355 bytes.

  • Dimensions of the image are: 400 × 400 × 3 (height × width × channels)

  • Each channel can be treated as a 400 × 400 matrix, that is, 400 samples (rows) with 400 features (columns) each, which gives 3 separate compression tasks, one per channel.

Image Channels

Since we have 3 channels (R, G, B), we will perform PCA for each channel individually.

Principal Component Analysis

In this section, we will use PCA to extract new dimensions from the image and plot their importance.

# Extract 3 channels
loaded_image_r <- loaded_image[, , 1]
loaded_image_g <- loaded_image[, , 2]
loaded_image_b <- loaded_image[, , 3]

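While performing PCA, we will not center the pixel values (center = FALSE). This departs from Step 2 of the PCA approach described earlier, but it lets us reconstruct the image directly from the scores and the rotation matrix, without having to add the column means back afterwards.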

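# PCA on each channel (uncentered)
loaded_image_r_pca <- stats::prcomp(loaded_image_r, center=FALSE)
loaded_image_g_pca <- stats::prcomp(loaded_image_g, center=FALSE)
loaded_image_b_pca <- stats::prcomp(loaded_image_b, center=FALSE)
# Collect the per-channel results in R, G, B order for the compression steps below
loaded_image_pca <- list(loaded_image_r_pca, loaded_image_g_pca, loaded_image_b_pca)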
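The effect of the centering choice on reconstruction can be illustrated with a small stand-in matrix (random values in [0, 1], an assumption in place of a real channel). Without centering, scores times the transposed rotation gives the data back directly; with centering, the stored column means have to be added back as well.

```r
set.seed(3)

# Hypothetical stand-in for one image channel: 20 x 20 values in [0, 1]
channel <- matrix(runif(20 * 20), nrow = 20)

# Uncentered PCA: x %*% t(rotation) rebuilds the data directly
pca_raw <- prcomp(channel, center = FALSE)
rebuilt_raw <- pca_raw$x %*% t(pca_raw$rotation)

# Centered PCA: the column means (pca$center) must be added back
pca_centered <- prcomp(channel, center = TRUE)
rebuilt_centered <- sweep(pca_centered$x %*% t(pca_centered$rotation),
                          2, pca_centered$center, "+")

print(all.equal(rebuilt_raw, channel, check.attributes = FALSE))
print(all.equal(rebuilt_centered, channel, check.attributes = FALSE))
```

Both routes recover the data when all components are kept. The uncentered variant simply avoids storing and re-adding the means.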

Scree Plot

A scree plot shows the percentage of variance in the original data explained by each principal component, in decreasing order.
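The percentages shown in a scree plot can be derived from the standard deviations returned by prcomp(). The sketch below is an assumption about how such a table might be built, not the notebook's own code: the compression code in this notebook filters tables such as e_values_r on a cumulative.variance.percent column, but their construction is not shown, so the column names here merely mirror that usage. A random matrix stands in for an image channel.

```r
set.seed(11)

# Hypothetical stand-in for one image channel
channel <- matrix(runif(30 * 30), nrow = 30)
pca <- prcomp(channel, center = FALSE)

# Eigenvalues are the squared standard deviations of the components
eigenvalue <- pca$sdev^2

# Assumed table layout, one row per principal component
e_values <- data.frame(
  eigenvalue = eigenvalue,
  variance.percent = 100 * eigenvalue / sum(eigenvalue),
  cumulative.variance.percent = 100 * cumsum(eigenvalue) / sum(eigenvalue)
)
print(head(e_values, 5))

# A basic scree plot of the first 10 components
barplot(e_values$variance.percent[1:10], names.arg = 1:10,
        xlab = "Principal component", ylab = "Explained variance (%)")
```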

Red Channel

Green Channel

Blue Channel

Image Compression

The size of the compressed image is determined by the number of principal components chosen for compression.

Fewer principal components result in a smaller image but involve more loss of information.
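One way to see this trade-off, independent of the JPEG file sizes measured in this notebook, is to count the values that must be stored. Keeping \(k\) components of an \(M \times N\) channel requires \(M \times k\) scores plus \(N \times k\) rotation entries, compared with \(M \times N\) original pixel values. The function name pca_storage_ratio is hypothetical, and the sketch assumes uncentered PCA so that no mean vector needs to be stored.

```r
# Fraction of the original M * N values needed to store k components:
# M * k scores plus N * k rotation entries
pca_storage_ratio <- function(M, N, k) {
  (k * (M + N)) / (M * N)
}

# A 400 x 400 channel at several component counts
k <- c(10, 50, 100, 200, 400)
ratio <- pca_storage_ratio(400, 400, k)
print(data.frame(components = k, stored_fraction = ratio))
```

By this count, a 400 × 400 channel only saves raw storage when fewer than 200 components are kept.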

The number of principal components can be chosen based on the percentage of variance explained.
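A minimal sketch of this selection follows, using made-up cumulative percentages. The helper components_for is hypothetical. It returns the smallest number of components whose cumulative explained variance reaches the threshold. For comparison, counting the components whose cumulative percentage is strictly below the threshold, as the filter in the code of the following sections does, yields one component fewer.

```r
# Made-up cumulative explained variance (%), one entry per component
cumulative <- c(62.0, 81.5, 90.4, 95.2, 98.1, 99.6, 100.0)

# Hypothetical helper: smallest k whose cumulative percentage
# reaches the threshold
components_for <- function(cumulative_percent, threshold) {
  which(cumulative_percent >= threshold)[1]
}

k_reaching <- components_for(cumulative, 90)  # first component at or above 90%
k_below <- sum(cumulative < 90)               # components strictly below 90%

print(c(reaching = k_reaching, below = k_below))
```

Here three components are needed to reach 90%, while only two lie strictly below it.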

90% Explained Variance

explained_variance <- 90.00

components <- list()
components[[1]] <- e_values_r %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[2]] <- e_values_g %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[3]] <- e_values_b %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()

compressed_image <- sapply(1:length(loaded_image_pca), function(channel_index) {
  channel_pca <- loaded_image_pca[[channel_index]]
  n_components <- components[[channel_index]]
  
  compressed_channel <- channel_pca$x[, 1:n_components] %*% t(channel_pca$rotation[, 1:n_components])
  compressed_channel[compressed_channel > 1] <- 1
  compressed_channel[compressed_channel < 0] <- 0
  return(compressed_channel)
}, simplify = 'array')

jpeg::writeJPEG(compressed_image, paste0("./tmp/compressed_", explained_variance, ".jpeg"))

plotImage(list(compressed_image))

  • Size of original image is 50355 bytes.

  • Size of compressed image is 9940 bytes.

  • Change in file size: -80.26% (an 80.26% reduction)

95% Explained Variance

explained_variance <- 95.00

components <- list()
components[[1]] <- e_values_r %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[2]] <- e_values_g %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[3]] <- e_values_b %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()

compressed_image <- sapply(1:length(loaded_image_pca), function(channel_index) {
  channel_pca <- loaded_image_pca[[channel_index]]
  n_components <- components[[channel_index]]
  
  compressed_channel <- channel_pca$x[, 1:n_components] %*% t(channel_pca$rotation[, 1:n_components])
  compressed_channel[compressed_channel > 1] <- 1
  compressed_channel[compressed_channel < 0] <- 0
  return(compressed_channel)
}, simplify = 'array')

jpeg::writeJPEG(compressed_image, paste0("./tmp/compressed_", explained_variance, ".jpeg"))

plotImage(list(compressed_image))

  • Size of original image is 50355 bytes.

  • Size of compressed image is 13874 bytes.

  • Change in file size: -72.45% (a 72.45% reduction)

99% Explained Variance

explained_variance <- 99.00

components <- list()
components[[1]] <- e_values_r %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[2]] <- e_values_g %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[3]] <- e_values_b %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()

compressed_image <- sapply(1:length(loaded_image_pca), function(channel_index) {
  channel_pca <- loaded_image_pca[[channel_index]]
  n_components <- components[[channel_index]]
  
  compressed_channel <- channel_pca$x[, 1:n_components] %*% t(channel_pca$rotation[, 1:n_components])
  compressed_channel[compressed_channel > 1] <- 1
  compressed_channel[compressed_channel < 0] <- 0
  return(compressed_channel)
}, simplify = 'array')

jpeg::writeJPEG(compressed_image, paste0("./tmp/compressed_", explained_variance, ".jpeg"))

plotImage(list(compressed_image))

  • Size of original image is 50355 bytes.

  • Size of compressed image is 20263 bytes.

  • Change in file size: -59.76% (a 59.76% reduction)

99.99% Explained Variance

explained_variance <- 99.99

components <- list()
components[[1]] <- e_values_r %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[2]] <- e_values_g %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()
components[[3]] <- e_values_b %>% 
  dplyr::filter(cumulative.variance.percent < explained_variance) %>% 
  nrow()

compressed_image <- sapply(1:length(loaded_image_pca), function(channel_index) {
  channel_pca <- loaded_image_pca[[channel_index]]
  n_components <- components[[channel_index]]
  
  compressed_channel <- channel_pca$x[, 1:n_components] %*% t(channel_pca$rotation[, 1:n_components])
  compressed_channel[compressed_channel > 1] <- 1
  compressed_channel[compressed_channel < 0] <- 0
  return(compressed_channel)
}, simplify = 'array')

jpeg::writeJPEG(compressed_image, paste0("./tmp/compressed_", explained_variance, ".jpeg"))

plotImage(list(compressed_image))

  • Size of original image is 50355 bytes.

  • Size of compressed image is 26909 bytes.

  • Change in file size: -46.56% (a 46.56% reduction)

100% Explained Variance

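# Set the label used in the output file name, so the 99.99% file is not overwritten
explained_variance <- 100.00

compressed_image <- sapply(1:length(loaded_image_pca), function(channel_index) {
  channel_pca <- loaded_image_pca[[channel_index]]
  # Keep every principal component of this channel
  n_components <- ncol(channel_pca$rotation)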
  
  compressed_channel <- channel_pca$x[, 1:n_components] %*% t(channel_pca$rotation[, 1:n_components])
  compressed_channel[compressed_channel > 1] <- 1
  compressed_channel[compressed_channel < 0] <- 0
  return(compressed_channel)
}, simplify = 'array')

jpeg::writeJPEG(compressed_image, paste0("./tmp/compressed_", explained_variance, ".jpeg"))

plotImage(list(compressed_image))

  • Size of original image is 50355 bytes.

  • Size of compressed image is 26935 bytes.

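  • Change in file size: -46.51% (a 46.51% reduction)

With all components retained, each channel is reconstructed essentially exactly, so the reduction that remains here comes from re-encoding the image with jpeg::writeJPEG() rather than from PCA.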
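The percentages reported in the sections above are the relative change in file size with respect to the original image. The sketch below recomputes them from the byte counts listed in this notebook. The function name size_change_percent is hypothetical.

```r
# Relative change in file size, in percent (negative means smaller)
size_change_percent <- function(original_bytes, compressed_bytes) {
  100 * (compressed_bytes - original_bytes) / original_bytes
}

# File sizes in bytes, as reported in the sections above
original <- 50355
compressed <- c("90" = 9940, "95" = 13874, "99" = 20263,
                "99.99" = 26909, "100" = 26935)

change <- round(size_change_percent(original, compressed), 2)
print(change)
```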